Adapting Text instead of the Model: An Open Domain Approach
Abstract
Natural language systems trained on labeled data from one domain do not perform well on other domains. Most adaptation algorithms proposed in the literature train a new model for the new domain using unlabeled data. However, retraining large models or pipeline systems is time-consuming. Moreover, the domain of a new target sentence may not be known, and one may not have a significant amount of unlabeled data for every new domain. To pursue the goal of Open Domain NLP (train once, test anywhere), we propose ADUT (ADaptation Using label-preserving Transformation), an approach that avoids the need for retraining and requires neither knowledge of the new domain nor any data from it. Our approach applies simple label-preserving transformations to the target text so that the transformed text is more similar to the training domain; it then applies the existing model to the transformed sentences and combines the predictions to produce the desired prediction on the target text. We instantiate ADUT for Semantic Role Labeling (SRL) and show that it compares favorably with approaches that retrain their model on the target domain. Specifically, this “on the fly” adaptation approach yields a 13% error reduction for a single-parse system when adapting from newswire text to fiction.
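The transform, predict, and combine loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. It assumes a model that returns one (label, confidence) pair per token. The toy labels "V" and "O" stand in for real SRL labels. The names `adut_predict`, `make_substitution`, `make_deletion`, and `toy_newswire_model`, as well as the substitution table and deletion set, are hypothetical. The confidence-weighted vote is an illustrative combination rule and is not necessarily the one the authors use. Each transformation is assumed to be label-preserving by construction and returns an alignment, so that predictions on the rewritten sentence can be mapped back to the original tokens.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set, Tuple

Tokens = List[str]
Alignment = List[Optional[int]]  # output token -> index of its source token
# Hypothetical model interface: one (label, confidence) pair per input token.
Model = Callable[[Tokens], List[Tuple[str, float]]]
# A transformation rewrites the sentence and reports how tokens line up.
Transform = Callable[[Tokens], Tuple[Tokens, Alignment]]


def identity(tokens: Tokens) -> Tuple[Tokens, Alignment]:
    """Keep the original sentence as one of the variants."""
    return list(tokens), list(range(len(tokens)))


def make_substitution(table: Dict[str, str]) -> Transform:
    """Swap words the model is unlikely to know for familiar equivalents.

    One-for-one replacement keeps the alignment trivial.
    """
    def transform(tokens: Tokens) -> Tuple[Tokens, Alignment]:
        return [table.get(t.lower(), t) for t in tokens], list(range(len(tokens)))
    return transform


def make_deletion(droppable: Set[str]) -> Transform:
    """Drop tokens (e.g. interjections) assumed to carry no role label."""
    def transform(tokens: Tokens) -> Tuple[Tokens, Alignment]:
        kept = [(t, i) for i, t in enumerate(tokens) if t.lower() not in droppable]
        return [t for t, _ in kept], [i for _, i in kept]
    return transform


def adut_predict(model: Model, transforms: List[Transform], tokens: Tokens) -> List[str]:
    """Run the unchanged model on every variant and combine per original token.

    Combination is a confidence-weighted vote (an illustrative choice).
    """
    scores: List[Dict[str, float]] = [defaultdict(float) for _ in tokens]
    for transform in transforms:
        new_tokens, alignment = transform(tokens)
        for (label, conf), src in zip(model(new_tokens), alignment):
            if src is not None:  # inserted tokens have no original position
                scores[src][label] += conf
    # Tokens that no variant kept fall back to "O" (no role).
    return [max(s, key=s.get) if s else "O" for s in scores]


KNOWN_VERBS = {"has", "said", "took"}


def toy_newswire_model(tokens: Tokens) -> List[Tuple[str, float]]:
    """Stand-in for a trained model: confident only on words it 'saw' in training."""
    return [("V", 0.9) if t.lower() in KNOWN_VERBS else ("O", 0.3) for t in tokens]


SENTENCE = ["Alas", ",", "he", "hath", "a", "sword"]
TRANSFORMS = [identity, make_substitution({"hath": "has"}), make_deletion({"alas", ","})]

if __name__ == "__main__":
    print(adut_predict(toy_newswire_model, TRANSFORMS, SENTENCE))
```

With only `identity`, the toy model labels the unfamiliar word "hath" as O. When the substitution variant is added, its confident V prediction on "has" outweighs the two low-confidence O votes for that position. This mirrors the abstract's idea: change the text toward the training domain instead of retraining the model.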
Similar resources
Temperature-dependent model of human cardiac sodium channel
Cardiac sodium channels are integral membrane proteins whose atomic-level structure is not yet known, and their molecular kinetics are still being studied through mathematical modeling. This study focused on adapting an existing model of the cardiac Na channel to analyze the molecular kinetics of channels at 9-37°C. Irvine et al. developed a Markov model for the Na channel using Neuronal Network Model ...
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP), made harder by the growing volume and heterogeneity of information and its unstructured form. One of the core information extraction tasks is relation extraction wh...
A genetic algorithm approach for open-pit mine production scheduling
In an Open-Pit Production Scheduling (OPPS) problem, the goal is to determine the mining sequence of an orebody represented as a block model. In this article, a linear programming formulation is used to pursue this goal. The OPPS problem is known to be NP-hard, so exact mathematical models cannot be applied to solve it at realistic scale. The Genetic Algorithm (GA) is a well-known member of evolutionary algorithms...
The Domain of the semantics of ‘promise’ in the Holy Quran
Semantics is the branch of linguistics through which the meanings of the words and sentences of a text can be analyzed and their parts of speech identified with regard to meaning. This is a descriptive-analytic study of the meaning of ‘promise’ in the Holy Quran, based on the principles of semantics and using a collocational approach and library research. Also, by virtue of ...